AITopics | toxic content

a6b99249d04f982fd2f7d5b2506bf541-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Feb-17-2026, 05:41:08 GMT

beancounter, large language model, machine learning, (22 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(24 more...)

Genre:

Financial News (1.00)
Research Report > New Finding (0.45)

Industry:

Law Enforcement & Public Safety (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Information Management (0.92)
(5 more...)

Teaching Models to Understand (but not Generate) High-risk Data

Wang, Ryan, Finlayson, Matthew, Soldaini, Luca, Swayamdipta, Swabha, Jia, Robin

arXiv.org Artificial Intelligence, Oct-16-2025

Language model developers typically filter out high-risk content -- such as toxic or copyrighted text -- from their pre-training data to prevent models from generating similar outputs. However, removing such data altogether limits models' ability to recognize and appropriately respond to harmful or sensitive content. In this paper, we introduce Selective Loss to Understand but Not Generate (SLUNG), a pre-training paradigm through which models learn to understand high-risk data without learning to generate it. Instead of uniformly applying the next-token prediction loss, SLUNG selectively avoids incentivizing the generation of high-risk tokens while ensuring they remain within the model's context window. As the model learns to predict low-risk tokens that follow high-risk ones, it is forced to understand the high-risk content. Through our experiments, we show that SLUNG consistently improves models' understanding of high-risk data (e.g., ability to recognize toxic content) without increasing its generation (e.g., toxicity of model responses). Overall, our SLUNG paradigm enables models to benefit from high-risk text that would otherwise be filtered out.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.03052

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry: Education > Instructional Theory (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
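
The selective loss described in the SLUNG abstract above can be illustrated with a small sketch. It assumes the simplest possible reading, in which high-risk tokens are dropped as prediction targets but stay in the context of later tokens; the function name `selective_nll` and the per-token inputs are hypothetical, and the paper's actual objective may differ.

```python
import math


def selective_nll(token_probs, high_risk):
    """Mean negative log-likelihood over low-risk target tokens only.

    token_probs[i]: probability the model assigned to the i-th target token.
    high_risk[i]:   True if the i-th target token is labeled high-risk.

    Assumption: high-risk tokens are excluded as targets only. They would
    still be part of the context that later predictions condition on.
    """
    if len(token_probs) != len(high_risk):
        raise ValueError("token_probs and high_risk must have equal length")
    kept = [-math.log(p) for p, risky in zip(token_probs, high_risk) if not risky]
    if not kept:
        return 0.0  # nothing to train on in this sequence
    return sum(kept) / len(kept)


# The middle token is high-risk, so its low probability adds no loss.
loss = selective_nll([0.5, 0.1, 0.5], [False, True, False])
```

Under this masking assumption, the model is never rewarded for producing the high-risk token, yet it must still use that token to predict the low-risk tokens that follow it.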

a6b99249d04f982fd2f7d5b2506bf541-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing Systems, Oct-10-2025, 12:24:43 GMT

beancounter, computational linguistic, dataset, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(24 more...)

Genre:

Financial News (1.00)
Research Report > New Finding (0.45)

Industry:

Law Enforcement & Public Safety (1.00)
Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
(11 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
Information Technology > Information Management (0.92)
(5 more...)

Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages

Hu, Yujia, Hee, Ming Shan, Nakov, Preslav, Lee, Roy Ka-Wei

arXiv.org Artificial Intelligence, Sep-24-2025

The advancement of Large Language Models (LLMs) has transformed natural language processing; however, their safety mechanisms remain under-explored in low-resource, multilingual settings. Here, we aim to bridge this gap. In particular, we introduce SGToxicGuard, a novel dataset and evaluation framework for benchmarking LLM safety in Singapore's diverse linguistic context, including Singlish, Chinese, Malay, and Tamil. SGToxicGuard adopts a red-teaming approach to systematically probe LLM vulnerabilities in three real-world scenarios: conversation, question-answering, and content composition. We conduct extensive experiments with state-of-the-art multilingual LLMs, and the results uncover critical gaps in their safety guardrails. By offering actionable insights into cultural sensitivity and toxicity mitigation, we lay the foundation for safer and more inclusive AI systems in linguistically diverse environments. Link to the dataset: https://github.com/Social-AI-Studio/SGToxicGuard. Disclaimer: This paper contains sensitive content that may be disturbing to some readers.

computational linguistic, large language model, machine learning, (21 more...)

arXiv.org Artificial Intelligence

2509.1526

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Singapore (0.63)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (0.68)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Defining, Understanding, and Detecting Online Toxicity: Challenges and Machine Learning Approaches

Shahi, Gautam Kishore, Majchrzak, Tim A.

arXiv.org Artificial Intelligence, Sep-19-2025

Online toxic content has grown into a pervasive phenomenon, intensifying during times of crisis, elections, and social unrest. A significant amount of research has been focused on detecting or analyzing toxic content using machine-learning approaches. The proliferation of toxic content across digital platforms has spurred extensive research into automated detection mechanisms, primarily driven by advances in machine learning and natural language processing. Overall, the present study represents the synthesis of 140 publications on different types of toxic content on digital platforms. We present a comprehensive overview of the datasets used in previous studies focusing on definitions, data sources, challenges, and machine learning approaches employed in detecting online toxicity, such as hate speech, offensive language, and harmful discourse. The dataset encompasses content in 32 languages, covering topics such as elections, spontaneous events, and crises. We examine the possibility of using existing cross-platform data to improve the performance of classification models. We present the recommendations and guidelines for new research on online toxic content and the use of content moderation for mitigation. Finally, we present some practical guidelines to mitigate toxic content from online platforms.

large language model, machine learning, toxic content, (18 more...)

arXiv.org Artificial Intelligence

2509.14264

Country:

Europe (1.00)
Asia (1.00)
North America > United States (0.68)

Genre:

Overview (1.00)
Research Report > New Finding (0.66)

Industry:

Media > News (1.00)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
Law (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Towards Safer Pretraining: Analyzing and Filtering Harmful Content in Webscale datasets for Responsible LLMs

Mendu, Sai Krishna, Yenala, Harish, Gulati, Aditi, Kumar, Shanu, Agrawal, Parag

arXiv.org Artificial Intelligence, Aug-14-2025

Large language models (LLMs) have become integral to various real-world applications, leveraging massive, web-sourced datasets like Common Crawl, C4, and FineWeb for pretraining. While these datasets provide linguistic data essential for high-quality natural language generation, they often contain harmful content, such as hate speech, misinformation, and biased narratives. Training LLMs on such unfiltered data risks perpetuating toxic behaviors, spreading misinformation, and amplifying societal biases, which can undermine trust in LLM-driven applications and raise ethical concerns about their use. This paper presents a large-scale analysis of inappropriate content across these datasets, offering a comprehensive taxonomy that categorizes harmful webpages into Topical and Toxic based on their intent. We also introduce a prompt evaluation dataset, a high-accuracy Topical and Toxic Prompt (TTP), and a transformer-based model (HarmFormer) for harmful content filtering. Additionally, we create a new multi-harm open-ended toxicity benchmark (HAVOC) and provide crucial insights into how models respond to adversarial toxic inputs. Our work offers insights into ensuring safer LLM pretraining and serves as a resource for Responsible AI (RAI) compliance. Disclaimer: This paper includes potentially offensive content due to the nature of the research.

category, large language model, machine learning, (20 more...)

arXiv.org Artificial Intelligence

2505.02009

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Media > News (0.87)
Law (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)
Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Text Detoxification: Data Efficiency, Semantic Preservation and Model Generalization

Yu, Jing, Zhao, Yibo, Zhu, Jiapeng, Shao, Wenming, Pang, Bo, Zhang, Zhao, Li, Xiang

arXiv.org Artificial Intelligence, Jul-8-2025

The widespread dissemination of toxic content on social media poses a serious threat to both online environments and public discourse, highlighting the urgent need for detoxification methods that effectively remove toxicity while preserving the original semantics. However, existing approaches often struggle to simultaneously achieve strong detoxification performance, semantic preservation, and robustness to out-of-distribution data. Moreover, they typically rely on costly, manually annotated parallel corpora while showing poor data efficiency. To address these challenges, we propose a two-stage training framework that jointly optimizes for data efficiency, semantic preservation, and model generalization. We first perform supervised fine-tuning on a small set of high-quality, filtered parallel data to establish a strong initialization. Then, we leverage unlabeled toxic inputs and a custom-designed reward model to train the LLM using Group Relative Policy Optimization. Experimental results demonstrate that our method effectively mitigates the trade-offs faced by previous work, achieving state-of-the-art performance with improved generalization and significantly reduced dependence on annotated data. Our code is available at: https://github.com/allacnobug/Detoxification-of-Text.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2507.0105

Country: Asia (0.68)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
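
The text detoxification abstract above trains with Group Relative Policy Optimization (GRPO), whose core idea is to score each sampled output relative to the other samples drawn for the same input. The sketch below shows only that normalization step. The function name, the `eps` constant, and the use of the population standard deviation are assumptions for illustration; the paper's reward model and policy update are not reproduced here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of samples for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so outputs
    that beat their siblings get a positive signal and the rest a negative one.
    """
    if not rewards:
        raise ValueError("rewards must not be empty")
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four detoxified candidates.
advantages = group_relative_advantages([0.9, 0.4, 0.4, 0.1])
```

Because the baseline is the group mean, no separate value network is needed to estimate it, which is the usual motivation given for this family of methods.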

LLM in the Loop: Creating the ParaDeHate Dataset for Hate Speech Detoxification

Yuan, Shuzhou, Nie, Ercong, Kouba, Lukas, Kangen, Ashish Yashwanth, Schmid, Helmut, Schütze, Hinrich, Färber, Michael

arXiv.org Artificial Intelligence, Jun-9-2025

Detoxification, the task of rewriting harmful language into non-toxic text, has become increasingly important amid the growing prevalence of toxic content online. However, high-quality parallel datasets for detoxification, especially for hate speech, remain scarce due to the cost and sensitivity of human annotation. In this paper, we propose a novel LLM-in-the-loop pipeline leveraging GPT-4o-mini for automated detoxification. We first replicate the ParaDetox pipeline by replacing human annotators with an LLM and show that the LLM performs comparably to human annotation. Building on this, we construct ParaDeHate, a large-scale parallel dataset specifically for hate speech detoxification. We release ParaDeHate as a benchmark of over 8K hate/non-hate text pairs and evaluate a wide range of baseline methods. Experimental results show that models such as BART, fine-tuned on ParaDeHate, achieve better performance in style accuracy, content preservation, and fluency, demonstrating the effectiveness of LLM-generated detoxification text as a scalable alternative to human annotation.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2506.01484

Country:

Asia (1.00)
Europe (0.93)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

ViToSA: Audio-Based Toxic Spans Detection on Vietnamese Speech Utterances

Do, Huy Ba, Huynh, Vy Le-Phuong, Nguyen, Luan Thanh

arXiv.org Artificial Intelligence, Jun-3-2025

Toxic speech on online platforms is a growing concern, impacting user experience and online safety. While text-based toxicity detection is well-studied, audio-based approaches remain underexplored, especially for low-resource languages like Vietnamese. This paper introduces ViToSA (Vietnamese Toxic Spans Audio), the first dataset for toxic spans detection in Vietnamese speech, comprising 11,000 audio samples (25 hours) with accurate human-annotated transcripts. We propose a pipeline that combines ASR and toxic spans detection for fine-grained identification of toxic content. Our experiments show that fine-tuning ASR models on ViToSA significantly reduces WER when transcribing toxic speech, while the text-based toxic spans detection (TSD) models outperform existing baselines. These findings establish a novel benchmark for Vietnamese audio-based toxic spans detection, paving the way for future research in speech content moderation.

detection, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.00636

Country:

Asia (1.00)
North America > Mexico (0.28)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.70)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.31)
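
The ViToSA abstract above reports results in terms of word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch using the standard definition: the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. The function name and whitespace tokenization are assumptions; the paper's evaluation tooling and text normalization are not specified in the abstract.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against a four-word reference.
wer = word_error_rate("a b c d", "a x c")
```

A lower WER on toxic speech matters for the pipeline the abstract describes, since span detection runs on the transcript and cannot flag words the ASR stage failed to recover.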

Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings

Yang, Shujian, Cui, Shiyao, Hu, Chuanrui, Wang, Haicheng, Zhang, Tianwei, Huang, Minlie, Lu, Jialiang, Qiu, Han

arXiv.org Artificial Intelligence, Jun-2-2025

Detecting toxic content using language models is important but challenging. While large language models (LLMs) have demonstrated strong performance in understanding Chinese, recent studies show that simple character substitutions in toxic Chinese text can easily confuse the state-of-the-art (SOTA) LLMs. In this paper, we highlight the multimodal nature of Chinese language as a key challenge for deploying LLMs in toxic Chinese detection. First, we propose a taxonomy of 3 perturbation strategies and 8 specific approaches in toxic Chinese content. Then, we curate a dataset based on this taxonomy, and benchmark 9 SOTA LLMs (from both the US and China) to assess if they can detect perturbed toxic Chinese text. Additionally, we explore cost-effective enhancement solutions like in-context learning (ICL) and supervised fine-tuning (SFT). Our results reveal two important findings. (1) LLMs are less capable of detecting perturbed multimodal Chinese toxic contents. (2) ICL or SFT with a small number of perturbed examples may cause the LLMs to "overcorrect", misidentifying many normal Chinese contents as toxic.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.24341

Country: Asia > China (0.49)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)
